NSF PAR Search | NSF Public Access Repository

Outlier Detection and Removal in Multivariate Time Series for a More Robust Machine Learning–based Solar Flare Prediction

https://doi.org/10.3847/1538-4365/adb9e3

Wen, Junzhi; Ahmadzadeh, Azim; Georgoulis, Manolis K; Sadykov, Viacheslav M; Angryk, Rafal A (April 2025, The Astrophysical Journal Supplement Series)

Abstract: Timely and accurate prediction of solar flares is a crucial task due to the danger they pose to human life and infrastructure beyond Earth’s atmosphere. Although various machine learning algorithms have been employed to improve solar flare prediction, there has been limited focus on improving performance using outlier detection. In this study, we propose the use of a tree-based outlier detection algorithm, Isolation Forest (iForest), to identify multivariate time-series instances within the flare-forecasting benchmark data set, Space Weather Analytics for Solar Flares (SWAN-SF). By removing anomalous samples from the nonflaring class (N-class) data, we observe a significant improvement in both the true skill score and the updated Heidke skill score in two separate experiments. We focus on analyzing outliers detected by iForest at a 2.4% contamination rate, considered the most effective overall. Our analysis reveals a co-occurrence between the outliers we discovered and strong flares. Additionally, we investigated the similarity between the outliers and the strong-flare data and quantified it using Kullback–Leibler divergence. This analysis demonstrates a higher similarity between our outliers and strong-flare data when compared to the similarity between the outliers and the rest of the N-class data, supporting our rationale for using outlier detection to enhance SWAN-SF data for flare prediction. Furthermore, we explore a novel approach by treating our outliers as if they belong to flaring-class data in the training phase of our machine learning, resulting in further enhancements to our models’ performance.
Free, publicly-accessible full text available April 1, 2026
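
The abstract above quantifies similarity with Kullback–Leibler divergence but does not say how the distributions were estimated. The following is a minimal sketch, assuming simple histograms over shared bins with a small smoothing constant; the function names, the bin edges, and the toy values are all hypothetical and are not taken from SWAN-SF.

```python
import math


def histogram(values, edges):
    """Count values into the bins defined by sorted edges (last bin is closed)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts


def kl_divergence(p_counts, q_counts, eps=1e-9):
    """D_KL(P || Q) in nats for two histograms over the same bins.

    eps is an assumed smoothing constant that avoids log(0) for empty bins.
    """
    p_total = sum(p_counts) + eps * len(p_counts)
    q_total = sum(q_counts) + eps * len(q_counts)
    total = 0.0
    for pc, qc in zip(p_counts, q_counts):
        p = (pc + eps) / p_total
        q = (qc + eps) / q_total
        total += p * math.log(p / q)
    return total


# Toy, made-up values of a single feature; illustration only.
edges = [0, 1, 2, 3, 4]
outliers = histogram([2.5, 3.1, 3.5, 2.2, 3.9], edges)
strong_flares = histogram([3.2, 3.8, 2.9, 3.3, 2.1, 3.6], edges)
rest_of_n_class = histogram([0.2, 0.5, 1.1, 0.7, 1.8, 0.3, 2.4], edges)

kl_to_strong = kl_divergence(outliers, strong_flares)
kl_to_rest = kl_divergence(outliers, rest_of_n_class)
```

A lower divergence means the two histograms are more alike. In this toy setup `kl_to_strong` is smaller than `kl_to_rest`, which mirrors the kind of comparison the abstract describes without reproducing its data or results.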
Towards Reliable Deep Learning Models for Solar Flare Prediction

https://doi.org/10.22541/essoar.173457205.58483493/v1

Pandey, Chetraj; Angryk, Rafal A; Aydin, Berkay (December 2024, ESS Open Archive)

Full Text Available
Embedding Ordinality to Binary Loss Function for Improving Solar Flare Forecasting

https://doi.org/10.1109/DSAA61799.2024.10722839

Pandey, Chetraj; Ji, Anli; Hong, Jinsu; Angryk, Rafal A; Aydin, Berkay (October 2024, IEEE)

Several natural phenomena, such as floods, earthquakes, volcanic eruptions, and extreme space weather events, often come with severity indexes. While these indexes, whether linear or logarithmic, are vital, data-driven predictive models for these events typically use a fixed threshold instead. In this paper, we explore encoding this ordinality to enhance the performance of data-driven models, with specific application in solar flare forecasting. The prediction of solar flares is commonly approached as a binary forecasting problem, categorizing events as either Flare (FL) or No-Flare (NF) based on a chosen threshold (e.g., >C-class, >M-class, or >X-class). However, this binary formulation overlooks the inherent ordinality between the sub-classes within each binary class (FL and NF). In this paper, we propose a novel loss function aimed at optimizing the binary flare prediction problem by embedding the intrinsic ordinal flare characteristics into the binary cross-entropy (BCE) loss function. This modification is intended to provide the model with better guidance based on the ordinal characteristics of the data and improve the overall performance of the models. For our experiments, we employ a ResNet34-based model with transfer learning to predict ≥M-class flares by utilizing the shape-based features of magnetograms of active region (AR) patches spanning from -90° to +90° of solar longitude as our input data. We use a composite skill score (CSS) as our evaluation metric, which is calculated as the geometric mean of the True Skill Score (TSS) and the Heidke Skill Score (HSS), to rank and compare our models' performance.
The primary contributions of this work are as follows: (i) We introduce a novel approach to encode ordinality into a binary loss function, showing an application to solar flare prediction. (ii) We enhance solar flare forecasting by enabling flare predictions for each AR across the entire solar disk, without any longitudinal restrictions, and evaluate and compare performance. (iii) Our candidate model, optimized with the proposed loss function, shows an improvement of ~17%, ~14%, and ~13% for AR patches within ±30°, ±60°, and ±90° of solar longitude, respectively, in terms of CSS when compared with standard BCE. Additionally, we demonstrate the ability to issue flare forecasts for ARs in near-limb regions (between ±60° and ±90°) with CSS=0.34 (TSS=0.50 and HSS=0.23), expanding the scope of AR-based models for solar flare prediction. This advances the reliability of solar flare forecasts, leading to more effective prediction capabilities.
Full Text Available
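
The composite skill score in the abstract above is stated to be the geometric mean of TSS and HSS. The sketch below works that arithmetic through. The TSS and HSS formulas are the common two-class forms and are an assumption, since the abstract does not spell them out; returning 0.0 when either score is non-positive is likewise an assumed convention.

```python
import math


def tss(tp, fn, fp, tn):
    """True skill score: recall minus false alarm rate."""
    return tp / (tp + fn) - fp / (fp + tn)


def hss(tp, fn, fp, tn):
    """Heidke skill score in its common two-class form (assumed here)."""
    numerator = 2 * (tp * tn - fn * fp)
    denominator = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    return numerator / denominator


def css(tss_value, hss_value):
    """Composite skill score: geometric mean of TSS and HSS.

    Returning 0.0 for non-positive inputs is an assumption; the abstract
    does not say how negative scores are handled.
    """
    if tss_value <= 0 or hss_value <= 0:
        return 0.0
    return math.sqrt(tss_value * hss_value)


# Check against the near-limb figures quoted in the abstract.
near_limb_css = css(0.50, 0.23)
print(round(near_limb_css, 2))  # 0.34
```

The quoted near-limb values are consistent with this definition: the square root of 0.50 × 0.23 rounds to 0.34.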
Unveiling the Potential of Deep Learning Models for Solar Flare Prediction in Near-Limb Regions

https://doi.org/10.1109/icmla58977.2023.00103

Pandey, Chetraj; Angryk, Rafal A; Aydin, Berkay (December 2023, IEEE)

Full Text Available
Advancing Solar Flare Prediction Using Deep Learning with Active Region Patches

https://doi.org/10.1007/978-3-031-70381-2_4

Pandey, Chetraj; Adeyeha, Temitope; Hong, Jinsu; Angryk, Rafal A; Aydin, Berkay (January 2024, Springer Nature Switzerland)

Full Text Available
Towards Interpretable Solar Flare Prediction with Attention-based Deep Neural Networks

https://doi.org/10.1109/aike59827.2023.00021

Pandey, Chetraj; Ji, Anli; Angryk, Rafal A; Aydin, Berkay (September 2023, IEEE)

Full Text Available
Explainable Deep Learning-Based Solar Flare Prediction with Post Hoc Attention for Operational Forecasting

Pandey, Chetraj; Angryk, Rafal A; Georgoulis, Manolis K; Aydin, Berkay (October 2023, Springer Nature Switzerland)

Full Text Available
Exploring Deep Learning for Full-disk Solar Flare Prediction with Empirical Insights from Guided Grad-CAM Explanations

https://doi.org/10.1109/dsaa60987.2023.10302639

Pandey, Chetraj; Ji, Anli; Nandakumar, Trisha; Angryk, Rafal A; Aydin, Berkay (October 2023, IEEE)

Full Text Available
Measuring Class-Imbalance Sensitivity of Deterministic Performance Evaluation Metrics

https://doi.org/10.1109/ICIP46576.2022.9897445

Ahmadzadeh, Azim; Angryk, Rafal A. (October 2022, 2022 IEEE International Conference on Image Processing (ICIP))

The class-imbalance issue is intrinsic to many real-world machine learning tasks, particularly to rare-event classification problems. Although the impact and treatment of imbalanced data are widely known, the magnitude of a metric’s sensitivity to class imbalance has attracted little attention. As a result, sensitive metrics are often dismissed even though their sensitivity may be only marginal. In this paper, we introduce an intuitive evaluation framework that quantifies metrics’ sensitivity to class imbalance. Moreover, we reveal an interesting fact: there is a logarithmic behavior in metrics’ sensitivity, meaning that higher imbalance ratios are associated with lower sensitivity of metrics. Our framework builds an intuitive understanding of the class-imbalance impact on metrics. We believe this can help avoid many common mistakes, especially the less-emphasized and incorrect assumption that all metrics’ quantities are comparable under different class-imbalance ratios.
Full Text Available
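
To make the idea of imbalance sensitivity concrete, here is a small illustration. It is not the paper's evaluation framework. It assumes a hypothetical classifier with fixed true positive and true negative rates, then varies only the negative-to-positive ratio and recomputes a few common metrics. All names and numbers are illustrative.

```python
def confusion_counts(tpr, tnr, n_pos, n_neg):
    """Expected confusion-matrix counts for fixed per-class rates."""
    tp = tpr * n_pos
    fn = n_pos - tp
    tn = tnr * n_neg
    fp = n_neg - tn
    return tp, fn, fp, tn


def metrics(tp, fn, fp, tn):
    """A few deterministic metrics computed from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
        "tss": recall - fp / (fp + tn),
    }


def sweep(tpr=0.8, tnr=0.9, n_pos=1000, ratios=(1, 10, 100, 1000)):
    """Recompute the metrics for each negative-to-positive imbalance ratio."""
    return {
        r: metrics(*confusion_counts(tpr, tnr, n_pos, n_pos * r))
        for r in ratios
    }


results = sweep()
for ratio, m in results.items():
    print(f"1:{ratio}", {name: round(value, 3) for name, value in m.items()})
```

With the per-class rates held fixed, TSS stays the same at every ratio, while precision and F1 drop as negatives come to dominate. That is why a value of one of those metrics measured at one imbalance ratio cannot be compared directly with a value measured at another.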
Towards coupling full-disk and active region-based flare prediction for operational space weather forecasting

https://doi.org/10.3389/fspas.2022.897301

Pandey, Chetraj; Ji, Anli; Angryk, Rafal A.; Georgoulis, Manolis K.; Aydin, Berkay (August 2022, Frontiers in Astronomy and Space Sciences)

Solar flare prediction is a central problem in space weather forecasting and has captivated the attention of a wide spectrum of researchers due to recent advances in remote sensing as well as in machine learning and deep learning approaches. The experimental findings based on both machine and deep learning models reveal significant performance improvements for task-specific datasets. Along with building models, the practice of deploying such models to production environments under operational settings is a more complex and often time-consuming process that is often not addressed directly in research settings. We present a set of new heuristic approaches to train and deploy an operational solar flare prediction system for ≥M1.0-class flares with two prediction modes: full-disk and active region-based. In full-disk mode, predictions are performed on full-disk line-of-sight magnetograms using deep learning models, whereas in active region-based models, predictions are issued for each active region individually using multivariate time series data instances. The outputs from individual active region forecasts and full-disk predictors are combined into a final full-disk prediction result with a meta-model. We utilized an equal-weighted average ensemble of two base learners’ flare probabilities as our baseline meta learner and improved the capabilities of our two base learners by training a logistic regression model.
The major findings of this study are: 1) We successfully coupled two heterogeneous flare prediction models trained with different datasets and model architectures to predict a full-disk flare probability for the next 24 h, 2) Our proposed ensembling model, i.e., logistic regression, improves on the predictive performance of two base learners and the baseline meta learner measured in terms of two widely used metrics, True Skill Statistic (TSS) and Heidke Skill Score (HSS), and 3) Our result analysis suggests that the logistic regression-based ensemble (Meta-FP) improves on the full-disk model (base learner) by ∼9% in terms of TSS and ∼10% in terms of HSS. Similarly, it improves on the AR-based model (base learner) by ∼17% and ∼20% in terms of TSS and HSS, respectively. Finally, when compared to the baseline meta model, it improves on TSS by ∼10% and HSS by ∼15%.
Full Text Available
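
The two meta learners described in the abstract above can be sketched as follows: an equal-weighted average of the two base probabilities, and a logistic regression fitted on them. This is a minimal illustration, not the authors' Meta-FP implementation. The plain gradient-descent fit, the learning rate, the epoch count, and the toy probability pairs are all assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def average_ensemble(p_full_disk, p_active_region):
    """Baseline meta learner: equal-weighted mean of the two probabilities."""
    return 0.5 * (p_full_disk + p_active_region)


def fit_logistic_meta(pairs, labels, lr=0.5, epochs=2000):
    """Fit w1, w2, b by batch gradient descent on the mean log loss."""
    w1 = w2 = b = 0.0
    n = len(pairs)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (p1, p2), y in zip(pairs, labels):
            err = sigmoid(w1 * p1 + w2 * p2 + b) - y
            g1 += err * p1
            g2 += err * p2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b


def meta_predict(params, p_full_disk, p_active_region):
    """Combined flare probability from the fitted logistic meta learner."""
    w1, w2, b = params
    return sigmoid(w1 * p_full_disk + w2 * p_active_region + b)


# Toy (full-disk, active-region) probability pairs with made-up labels.
pairs = [(0.9, 0.8), (0.7, 0.9), (0.6, 0.4), (0.2, 0.3), (0.1, 0.2), (0.4, 0.1)]
labels = [1, 1, 1, 0, 0, 0]
params = fit_logistic_meta(pairs, labels)
```

Unlike the fixed average, the fitted model can weight the two base learners unequally and shift the decision boundary through its bias term.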
